PosiGene: automated and easy-to-use pipeline for genome-wide detection of positively selected genes

Authors

  • Arne Sahm
  • Martin Bens
  • Matthias Platzer
  • Karol Szafranski
Abstract

Many comparative genomics studies aim to find the genetic basis of species-specific phenotypic traits. A prevailing strategy is to search genome-wide for genes that evolved under positive selection, based on the non-synonymous to synonymous substitution ratio. However, incongruent results, largely due to high false positive rates, indicate the need for standardization of quality criteria and software tools. The main challenges are ortholog and isoform assignment, the high sensitivity of the statistical models to alignment errors and the imperative to parallelize large parts of the software. We developed the software tool PosiGene, which (i) detects positively selected genes (PSGs) on a genome scale, (ii) allows analysis of specific evolutionary branches, (iii) can be used in arbitrary species contexts and (iv) offers visualization of the results for further manual validation and biological interpretation. We exemplify PosiGene's performance using simulated and real data. In the simulated data approach, we determined a false positive rate <1%. With real data, we found that 68.4% of the PSGs detected by PosiGene were shared by at least one previous study that used the same set of species. PosiGene is a user-friendly, reliable tool for reproducible genome-wide identification of PSGs and is freely available at https://github.com/gengit/PosiGene.
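
To make the non-synonymous versus synonymous distinction concrete, the following is a minimal Python sketch that classifies codon differences between two aligned coding sequences. It is an illustration only and not PosiGene's method: it is a naive count that does not normalize by the number of synonymous and non-synonymous sites and does not fit any statistical model. The function name `count_codon_differences` and the example sequences are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons.

    Both sequences must be aligned, in frame, and of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    This is a naive count (hypothetical helper), not a dN/dS estimate.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    synonymous = 0
    non_synonymous = 0
    for i in range(0, len(seq_a) - len(seq_a) % 3, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: not classifiable here
        if codon_a == codon_b:
            continue  # identical codons carry no substitution
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


if __name__ == "__main__":
    # CTT -> CTC keeps leucine (synonymous); AAA -> AGA changes K to R.
    print(count_codon_differences("ATGCTTAAA", "ATGCTCAGA"))
```

A real analysis would estimate the ratio under a codon substitution model along a phylogeny rather than from raw pairwise counts; the sketch only shows which kind of change falls into which class.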


Similar resources

O-36: Genome Haplotyping and Detection of Meiotic Homologous Recombination Sites in Single Cells, A Generic Method for Preimplantation Genetic Diagnosis

Background: Haplotyping is invaluable not only to identify genetic variants underlying a disease or trait, but also to study evolution and population history as well as meiotic and mitotic recombination processes. Current genome-wide haplotyping methods rely on genomic DNA that is extracted from a large number of cells. Thus far random allele drop out and preferential amplification artifacts of...


Comparison between conventional PCR and PCR - ELISA for detection of Brucella melitensis

Molecular detection techniques are believed to be key tools for both prevention and treatment follow-up of brucellosis in livestock and human beings. Consequently, rapid, reliable, easy-to-perform and automated systems for Brucella detection are urgently needed to allow early diagnosis and timely, adequate antibiotic therapy. Brucellosis is a worldwide re-emerging zoonosis causing high econ...


Bioinformatics Genome-Wide Characterization of the WRKY Gene Family in Sorghum bicolor

The WRKY gene family encodes a large group of transcription factors that regulate genes involved in plant response to biotic and abiotic stresses. Sorghum is a notable grain and forage crop in semi-arid regions because of its unusual tolerance against hot and dry environments. We identified a set of 85 WRKY genes in the S. bicolor genome and classified them into three groups (I–III). Among the ...


Genome-wide Association Study to Identify Genes and Biological Pathways Associated with Type Traits in Cattle using Pathway Analysis

Extended Abstract Introduction and Objective: Type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlated with other economically important traits in cattle, including fertility, longevity and carcass traits. The present study aimed to conduct a genome-wide association study (GWAS) based on gene-set enrichment analysis for identifying the ...


In Silico Genome-Wide Screening for TnrA-Regulated Genes of Bacillus clausii

Bacillus clausii TnrA transcription factor is required for global nitrogen regulation. In order to obtain an overview of gene regulation by TnrA in B. clausii KSMK16, the entire genome of B. clausii was screened for the consensus sequence 5'-TGTNAN7TNACA-3', known as the TnrA box, and 13 transcription units were found containing a putative TnrA box. The TnrA targets identified in...



Volume 45

Publication date: 2017